Human and pathogen genotype-by-genotype interactions in the light of coevolution theory

Antagonistic coevolution (i.e., reciprocal adaptation and counter-adaptation) between hosts and pathogens has long been considered an important driver of genetic variation. However, direct evidence for this is still scarce, especially in vertebrates. The wealth of data on the genetics of susceptibility to infectious disease in humans provides an important resource for understanding host–pathogen coevolution, but studies of humans are rarely framed in terms of coevolutionary theory. Here, I review data from human host–pathogen systems to critically assess the evidence for a key assumption of models of host–pathogen coevolution: the presence of host genotype-by-pathogen genotype interactions (G×G). I also attempt to infer whether the observed G×G best fit “gene-for-gene” or “matching allele” models of coevolution. I find several examples of G×G in humans (involving, e.g., ABO, HBB, FUT2, SLC11A1, and HLA genes) that fit the assumptions of either gene-for-gene or matching allele models. This means that coevolution has the potential to drive polymorphism in humans as well (and presumably in other vertebrates), but further studies are required to investigate how widespread this process is.


Introduction
Population genetic analyses of humans as well as other organisms have shown that immune genes and other genes at the host-pathogen interface are often highly polymorphic. Moreover, many of these polymorphisms are associated with susceptibility to infectious and inflammatory/autoimmune disease and have therefore likely been subject to natural selection [1,2]. Natural selection is normally expected to eliminate genetic variation, so why are immune genes so variable?
A popular idea is that the high level of polymorphism is a result of host-pathogen coevolution driven by negative frequency-dependent selection (NFDS; Box 1), often referred to as "Red Queen dynamics" [3,4]. This can occur because infection typically requires pathogen binding to some host molecule to gain access to tissue and/or pathogen evasion of immune recognition to avoid clearance. Regardless of the type of molecular interaction, pathogens should evolve to infect common host genotypes that are then selected against and decline in frequency, followed by pathogen adaptation to alternative host genotypes which are then selected against. Such persistent NFDS as a result of continuous pathogen adaptation to the currently most frequent host genotype can lead to the maintenance of 2 or more alternative alleles for long time periods at the loci involved.
The finding that genes at the host-pathogen interface in humans and other organisms often have signatures of balancing selection [5–7] is clearly consistent with the idea of coevolution by NFDS. However, balancing selection on such genes could also be a result of other forms of pathogen-mediated balancing selection, like heterozygote advantage or spatiotemporal heterogeneity in pathogen abundance driven by environmental factors [3]. None of the latter processes involve reciprocal selection for adaptation and counter-adaptation as in coevolution; instead, they represent unidirectional selection by pathogens on the host.
In invertebrates and plants, "time-shift experiments" (in which hosts are exposed to pathogens from the past, present, and future) have demonstrated that coevolution by NFDS indeed plays a role in natural populations [8,9]. However, such experiments are difficult to perform on vertebrates, and there is little other evidence that balancing selection in vertebrates is a result of host-pathogen coevolution by NFDS. Moreover, although NFDS is in principle a very powerful driver of polymorphism, theoretical models have shown that it occurs only in a rather narrow parameter space [10]. Thus, it is relevant to ask: How important is coevolution, with continuous adaptation and counter-adaptation of host and pathogen, as a driver of polymorphism in vertebrates?
A good way to start investigating the role of coevolution by NFDS in vertebrates is to test assumptions that are specific to models of host-pathogen coevolution. The key assumption of classical models of host-pathogen coevolution by NFDS is that infection depends not only on genetic variation in host and pathogen, but also on the combination of host and pathogen genotypes. Thus, in statistical terms, there needs to be a host genotype-by-pathogen genotype interaction (G×G) for susceptibility to infection [3,4,11].
There are 2 basic types of models of host-pathogen coevolution, with different types of G×G: "matching allele" (MA) and "gene-for-gene" (GFG) models [10,12]. Briefly, MA models assume G×G such that different host genotypes are susceptible to different pathogen genotypes, while GFG models assume G×G such that host genotypes differ in the range of pathogen genotypes they are susceptible to. Both scenarios can lead to coevolution by NFDS and the long-term maintenance of polymorphism, but the GFG scenario will only do so if there is a cost of resistance (see Fig 1 for details). Thus, testing for G×G and investigating the nature of G×G provides a key to understanding host-pathogen coevolution.
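As a minimal illustration of why the two types of G×G behave differently, the sketch below iterates a deterministic haploid model with 1 host locus and 1 pathogen locus (2 alleles each), in the spirit of the scenario in Fig 1. The infection matrix encoding, the fitness functions (host fitness reduced by s times the chance of meeting a pathogen that infects it; pathogen fitness reduced by t times the chance of meeting a host it cannot infect), and the parameter values are illustrative assumptions, not taken from any cited model. There are no costs of resistance or infectivity here, so under MA the host allele frequency rises and falls repeatedly (in this discrete-time version the cycles slowly widen), whereas under GFG the resistant host allele and the virulent pathogen allele only ever increase.

```python
def simulate(infect, p0, q0, s=0.2, t=0.2, generations=200):
    """Deterministic haploid host x haploid pathogen model, 2 alleles each.

    infect[h][g] = 1 if pathogen genotype g can infect host genotype h, else 0.
    p = frequency of host genotype 0; q = frequency of pathogen genotype 0.
    Returns the trajectories of p and q (each of length generations + 1).
    """
    p, q = p0, q0
    ps, qs = [p], [q]
    for _ in range(generations):
        host = (p, 1.0 - p)
        path = (q, 1.0 - q)
        # Host fitness falls with the chance of meeting a pathogen that infects it.
        wh = [1.0 - s * sum(path[g] * infect[h][g] for g in range(2)) for h in range(2)]
        # Pathogen fitness falls with the chance of meeting a host it cannot infect.
        wp = [1.0 - t * sum(host[h] * (1 - infect[h][g]) for h in range(2)) for g in range(2)]
        # Haploid selection update; both species are updated from the same generation.
        p = p * wh[0] / (p * wh[0] + (1.0 - p) * wh[1])
        q = q * wp[0] / (q * wp[0] + (1.0 - q) * wp[1])
        ps.append(p)
        qs.append(q)
    return ps, qs


# Matching allele: pathogen 0 infects only host 0, pathogen 1 only host 1.
MA = [[1, 0], [0, 1]]
# Gene-for-gene: host 0 is susceptible to both pathogens; host 1 resists
# pathogen 0 (avirulent) but not pathogen 1 (virulent, infects both hosts).
GFG = [[1, 1], [0, 1]]

if __name__ == "__main__":
    ma_p, _ = simulate(MA, p0=0.6, q0=0.5)
    gfg_p, gfg_q = simulate(GFG, p0=0.9, q0=0.9)
    print("MA host allele range:", round(min(ma_p), 3), "to", round(max(ma_p), 3))
    print("GFG final susceptible host / avirulent pathogen:", round(gfg_p[-1], 3), round(gfg_q[-1], 3))
```

Adding a fitness cost to the resistant host genotype and to the virulent pathogen genotype in `wh` and `wp` is the natural way to explore the costly GFG case, but the parameter range that then yields NFDS is not explored here.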
There are numerous studies demonstrating G×G in plant and invertebrate host-pathogen systems (for examples, see [13,14]), but explicit tests for G×G in vertebrates have been scarce [15]. However, during the last decade, several genome-wide tests for G×G in human host-pathogen systems have been published. Here, I systematically review the evidence for G×G from these studies, as well as from candidate gene analyses, and evaluate the implications for our understanding of the importance of coevolution between pathogens and humans (and vertebrates in general) as a cause of balancing selection. I focus on the following questions: For which human genes is there evidence of G×G? Are these G×G of MA or GFG type? Are there other types of costs (e.g., risk of autoimmune disease) associated with genes involved in G×G (which could help maintain polymorphism in the case of GFG type G×G)? Do genes involved in G×G show signatures of balancing selection (as would be expected if they are engaged in coevolution by NFDS)?

Box 1. Glossary
Host-pathogen coevolution: a form of antagonistic coevolution, where there is reciprocal selection for adaptation and counter-adaptation in 2 species that affect each other's fitness negatively.
Negative frequency-dependent selection: when the fitness of an allele is negatively correlated with its frequency (direct NFDS) or the frequency of an allele at another locus (indirect NFDS). In the case of host-pathogen coevolution, there needs to be indirect NFDS in the sense that the fitness of a host allele depends on the frequency of the pathogen allele with which it interacts [3,10].

Fig 1. Coevolutionary consequences of different types of G×G.
For simplicity, the figure illustrates a scenario where both host and pathogen are haploid and where the G×G involves 1 host locus and 1 pathogen locus (each with 2 different alleles). In MA models, there is a G×G such that different pathogen genotypes infect different host genotypes (a). MA models readily lead to NFDS and the long-term maintenance of polymorphism at interacting loci in both host and pathogen, either in the form of cyclic allele frequencies or a stable polymorphism (b). This occurs because resistance to 1 pathogen genotype comes with a cost in the form of susceptibility to other pathogen genotypes. In other words, under the MA scenario, there is a trade-off between resistance to different pathogen genotypes. In GFG models, there is a G×G such that some pathogen genotypes infect a wider range of host genotypes than others (c). In the basic GFG scenario, there is no cost of host resistance or pathogen infectivity. When a host allele that improves resistance without any costs (to the host) occurs in a population, it will be favoured by selection and driven to fixation. Similarly, when a pathogen allele that improves infectivity without costs (to the pathogen) occurs, it will go to fixation. GFG models without costs of resistance or infectivity therefore lead to selective sweeps with only brief, transient polymorphisms, often referred to as arms race coevolution ((d); note that successive sweeps often occur at different sites in the genome, as indicated by different types of lines). However, if there is a cost of host resistance in the currency of another fitness-related trait, so that no host genotype has the highest fitness under all conditions (and a cost of pathogen infectivity, so that no pathogen genotype has the highest fitness under all conditions), GFG models can also lead to coevolution by NFDS and the long-term maintenance of polymorphism in the same way as MA models (b) [12]. Note that different types of molecular interactions between host and pathogen can result in both MA and GFG type G×G (see [11]). Whether NFDS results in cycles or stable polymorphism (b) depends on the relative importance of 2 different types of NFDS: direct NFDS (where the fitness of an allele is negatively correlated with its frequency) and indirect NFDS (where the fitness of an allele in the host is negatively correlated with the frequency of an allele at the locus involved in G×G in the coevolving pathogen) [10]. Based on figures in [4,11]. GFG, gene-for-gene; G×G, host genotype-by-pathogen genotype interaction; MA, matching allele; NFDS, negative frequency-dependent selection.
https://doi.org/10.1371/journal.pgen.1010685.g001

Literature search
Studies of G×G in humans have not used consistent terminology (for example, the term genotype-by-genotype interaction or similar is rarely used in the literature on humans), so it is difficult to perform a focused literature search with narrow search terms. Instead, I identified relevant papers by a combination of broad reading of the literature (particularly review papers on the genetics of susceptibility to pathogens in humans) and a literature search with broad search terms (Box 2). I included studies showing G×G for any infection-related trait: not only analyses of susceptibility to infection (the trait traditionally in focus in models of coevolution), but also studies using disease severity, pathogen load, immune escape mutations, etc., as the outcome. I only considered natural genetic variation, so studies of genetically modified pathogens or human cell lines were excluded.

Study types and prevalence of G×G
I found evidence for G×G in 10 human host-pathogen systems, including protozoan, bacterial, and viral pathogens (Table 1). Evidence for G×G comes from several different types of studies, from epidemiological analyses to in vitro assays. Moreover, G×G were detected in several different ways. Several studies tested for G×G between 1 or several host candidate genes and pathogen strains. Another approach, employed in some of the most recent studies, is genome-wide testing for G×G in both host and pathogen, that is, performing genome-wide SNP typing of both host and pathogen and then testing for G×G between all pairs of host and pathogen SNPs, referred to as "genome-to-genome" analysis [16]. Other studies used various combinations of candidate gene analysis, pathogen strain identification, and genome-wide analyses.

Box 2. Literature search
I first identified relevant papers by broad reading of the literature, in particular review papers on the genetics of susceptibility to pathogens in humans; this yielded a first set of 13 papers showing G×G, involving 8 different pathogens. To find more papers, I performed a literature search in Web of Science Core Collection in Dec 2022. To this end, I extracted keywords from the titles and abstracts of the first set of papers and constructed a query with relatively broad search terms that still yielded a manageable number of records [Topic = human AND (genetic varia* OR polymorph*) AND (bacteria* OR viral OR virus OR parasite OR pathogen) AND (interact* OR interplay OR "genome to genome"), which yielded approximately 3,900 records]. By scanning titles and abstracts of these records, I identified papers that considered genetic variation of both host and pathogen; these papers (approximately 1% of the records) were examined in detail (both original results and cited references). In the end, this literature search produced 15 additional papers with evidence for G×G. Most of these concerned pathogens and/or human genes already included in the first set of papers; the list of pathogens and host genes involved in G×G should thus be reasonably complete.
To gain insight into how common G×G are, it is useful to focus on studies based on genome-wide analyses of humans (genome-to-genome studies and genome-wide tests for interactions with pathogen strains), as they should provide a less biased estimate of the occurrence of G×G than candidate gene studies. Genome-wide analyses have been performed with viral and bacterial pathogens. All 3 genome-wide tests for G×G with viruses found evidence for G×G [17–19]. Most of these G×G concern immune escape mutations; the only exception is [18], which also found G×G for viral load. Of the genome-wide analyses for G×G with bacterial pathogens, 2 studies found statistically significant G×G [20,21] while 1 did not [22]. In all studies where G×G were found, only 1 or a few host loci were involved. Thus, the currently available data indicate that G×G occur in most host-pathogen pairs, but that at most a few host genes are involved in each pair. It should be noted, though, that the multiple testing burden in genome-wide tests for G×G is considerable [16], so future studies with higher power may reveal that a larger number of host loci are often involved in G×G in each host-pathogen pair.
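To give a sense of the scale of the multiple testing burden, the sketch below computes a Bonferroni-corrected per-test threshold when every host SNP is tested against every pathogen variant. The SNP and variant counts are hypothetical round numbers, and Bonferroni correction is used only as the simplest, conservative choice (tests on SNPs in linkage disequilibrium are not independent); the cited studies may correct differently.

```python
def genome_to_genome_threshold(n_host_snps, n_pathogen_variants, alpha=0.05):
    """Return (number of tests, Bonferroni per-test threshold) for all host x pathogen pairs."""
    n_tests = n_host_snps * n_pathogen_variants
    return n_tests, alpha / n_tests


# Hypothetical sizes, for illustration only.
for n_pathogen in (1, 1_000):
    n, thr = genome_to_genome_threshold(1_000_000, n_pathogen)
    print(f"{n_pathogen:>5} pathogen variant(s): {n:.0e} tests, threshold {thr:.0e}")
```

With a single pathogen variant this reduces to the conventional genome-wide threshold of 5 × 10^-8; with 1,000 pathogen variants it drops to 5 × 10^-11, which is why genome-to-genome studies need large samples to detect all but the strongest G×G.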

For which genes is there evidence of G×G?
Human genes with evidence for G×G include some genes that are textbook examples of associations with susceptibility to infectious disease, such as the MHC class I genes (HLA-A, HLA-B, HLA-C; G×G with, e.g., Plasmodium falciparum and HIV) and the blood group antigen gene ABO (G×G with Helicobacter pylori and Vibrio cholerae). A recent study also found G×G between P. falciparum and HBB (encoding the hemoglobin β subunit), which has well-known effects on susceptibility to malaria [23]. Specifically, the protective effect of the HbS variant at HBB was found to depend on the genotype at 3 different loci in the P. falciparum genome, all 3 of which are in strong linkage disequilibrium such that the minor alleles occur together. In addition, genes with evidence for G×G include some canonical immune genes (e.g., TLR2, IFNL4, KIR2DL2, CD209) and other genes at the host-pathogen interface with well-documented associations with susceptibility to infection (FUT2, SLC11A1), but also several genes that had not previously been recognised in this context (e.g., FARP1, STK32C, UNC5D).

Are these G×G of MA or GFG type?
None of the studies put their results in the context of MA versus GFG. I therefore inferred the type of G×G based on the published data. G×G were detected for several different infection phenotypes, including both binary (e.g., infection status) and continuous traits (e.g., pathogen load or disease severity in infected individuals). For case-control studies of infection status or other binary disease phenotypes, where it is possible to calculate host genotype odds ratios (OR) separately for each pathogen genotype, this information can be used to distinguish MA and GFG. To see this, consider the simplest case where there is G×G between a pair of loci that are bi-allelic in both host and pathogen, as in Fig 1. If there is a trade-off between resistance to different pathogen genotypes as in the MA scenario (Fig 1A), host genotype should be associated with both pathogen genotypes, but in opposite ways. Thus, the OR should be >1 for one pathogen genotype but <1 for the other (note that neither OR needs to be significantly different from 1, but they should be different from each other). In contrast, if host genotypes differ in the range of pathogen genotypes they are resistant/susceptible to, as in the GFG scenario (Fig 1C), host genotype should only be associated with one of the pathogen genotypes. Thus, the OR should be significantly different from 1 for one pathogen genotype but equal to 1 for the other.
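The reasoning above can be turned into a small decision rule. The sketch below computes pathogen-genotype-specific ORs with Woolf (log-scale) 95% confidence intervals and labels the pattern. The function names, the order in which the rules are applied, and the example counts are hypothetical and chosen for illustration; this is not the procedure used in the cited studies. The test of whether the 2 ORs differ treats the 2 tables as independent, which is only approximate when both pathogen genotypes are compared against the same control group.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal quantile


def log_or(a, b, c, d):
    """Log odds ratio and Woolf standard error from one 2x2 table.

    a, b: cases (infected with one pathogen genotype) with / without the host allele
    c, d: controls with / without the host allele
    Assumes all four counts are > 0 (no continuity correction).
    """
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def classify_gxg(table1, table2, z=Z95):
    """Heuristic MA/GFG label for one host variant and two pathogen genotypes."""
    (l1, s1), (l2, s2) = log_or(*table1), log_or(*table2)
    sig1, sig2 = abs(l1) > z * s1, abs(l2) > z * s2  # does each 95% CI exclude OR = 1?
    # Do the two ORs differ from each other (i.e., is there any G×G at all)?
    differ = abs(l1 - l2) > z * math.sqrt(s1 ** 2 + s2 ** 2)
    if not differ:
        label = "no GxG detected"
    elif sig1 != sig2:
        label = "GFG-like"  # host effect against one pathogen genotype only
    elif l1 * l2 < 0:
        label = "MA-like"  # host effects in opposite directions
    else:
        label = "inconclusive"  # same direction, different magnitude
    return math.exp(l1), math.exp(l2), label


print(classify_gxg((5, 95, 30, 70), (30, 70, 30, 70))[2])   # GFG-like
print(classify_gxg((25, 75, 35, 65), (35, 65, 25, 75))[2])  # MA-like
```

In the second example neither OR differs significantly from 1, yet the ORs differ from each other, matching the MA pattern described above. Note that, with these rules, one significant OR plus a non-significant OR in the opposite direction is labeled GFG-like, which is one way an MA type G×G could be misread.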
Of the case-control studies with evidence for G×G for infection status or other binary disease phenotypes, 8 present pathogen genotype-specific ORs, with 1 to 3 host genes involved in G×G with each pathogen [20,23–28]. In all but 1 case, the OR is significantly different from 1 for one pathogen genotype but not for the others. Thus, in the majority of cases the pattern is most consistent with GFG type G×G. Perhaps the most striking example is HBB and resistance to malaria [23]. Here, HbS protects against severe malaria if an individual is infected with a parasite carrying the major allele at all 3 loci involved in G×G (OR ≈ 0.02) but not when infected with parasites carrying the minor allele at all 3 loci (OR ≈ 1) (based on data in Fig 2 of [23]). Similar differences between host genotypes in the range of pathogen strains they are susceptible to (but without any indication of a trade-off between resistance to different strains) are seen with, for example, ABO (with H. pylori and V. cholerae) [27,29] and FUT2 (with Norovirus) [25]. The only indication of MA type G×G in case-control studies concerns HLA class II and risk of cervical cancer caused by human papillomavirus (HPV), where different HLA haplotypes affect susceptibility to different HPV types [28].
There are also several studies that have performed analyses of associations between pathogen and host alleles in chronic viral infections [17–19,30–33]. Such G×G are generally interpreted as being a result of within-host evolution of immune escape, although they could also reflect differences in susceptibility to infection with viruses carrying different alleles at the start of the infection. These studies have primarily found G×G involving HLA genes. It is generally difficult to infer whether these G×G are of MA or GFG type, because specific HLA alleles are often associated with escape mutations at several positions in the viral genome. However, one of the studies of HIV found that different HLA alleles were associated with different amino acid escape mutations at a particular position [31], a pattern clearly indicating a trade-off between resistance to different pathogen genotypes; thus, in at least some cases, G×G for escape mutations are consistent with the MA scenario.
Besides studies based on epidemiological analyses of presence/absence of infectious disease or immune escape mutations, there are also some studies finding G×G for various continuous infection-related traits like pathogen replication and disease severity [18,21,32,34–37]. Provided that such a trait is associated with both host and pathogen fitness, G×G affecting it could also lead to coevolution. A study using an in vitro assay of HIV replication found that NK cells with at least 1 copy of the KIR2DL2 allele inhibit replication of 1 specific HIV genotype, while NK cells without KIR2DL2 have limited inhibitory effect regardless of HIV genotype [34], a pattern consistent with GFG type G×G (Fig 2A). Similarly, a study analysing effects of HIV escape mutations on viral load showed that certain virus genotypes had reduced viral load in individuals carrying a specific MHC allele, while there was no effect of virus genotype on viral load in individuals without that allele; this pattern also appears consistent with the GFG scenario (Fig 2B) [32]. In contrast, an analysis of tuberculosis patients showed that SLC11A1 genotype had opposite effects on disease severity depending on Mycobacterium tuberculosis lineage [35], consistent with MA type G×G (Fig 2C). Overall, 4 of the 7 analyses of continuous traits showed results consistent with GFG type G×G, while 3 are consistent with MA type G×G (Table 1).
Finally, in vitro functional analyses of the ability of different Helicobacter pylori isolates to bind host receptors showed that most isolates are generalists and bind both A and H antigen (from individuals with blood group A and O, respectively) while a significant fraction of strains in South America are specialists and bind only H antigen [29], consistent with GFG type G×G.

Are there other types of costs associated with genes involved in G×G?
Since several of the G×G appeared to be of the GFG type, it would be of interest to know if the genes involved are associated with other types of diseases that could lead to the fitness cost necessary to generate NFDS and help maintain polymorphism in case of a GFG type G×G (Fig 1).
To identify potential costs of alleles conferring resistance to a particular pathogen genotype, I searched for disease associations of genes involved in G×G in the GWAS Catalog [38] and, for HLA genes, in PheWAS Resources [40]. The GWAS Catalog was searched either directly or via LDtrait at LDlink [39], to check whether SNPs involved in G×G were in linkage disequilibrium with SNPs associated with other diseases.
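Linkage disequilibrium between a G×G SNP and a disease-associated SNP is usually summarized by D′ and r². As a minimal sketch of these standard definitions (the function name and example frequencies are hypothetical, and this is not how LDlink is queried), both can be computed from one haplotype frequency and the 2 allele frequencies:

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of A and B (both SNPs assumed polymorphic).
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


print(ld_stats(0.2, 0.3, 0.4))  # D = 0.08, D' ≈ 0.44, r^2 ≈ 0.13
```

Tools such as LDlink estimate these quantities from reference-panel haplotypes; the sketch assumes the haplotype frequency is already known.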
For several of the genes with GFG type G×G, there is indeed strong evidence for costs associated with the allele that confers resistance to a subset of pathogen genotypes. For example, the nonfunctional FUT2 allele, which protects against some Norovirus strains, is also associated with Crohn's disease and other diseases [41,42]. Similarly, KIR2DL2, which inhibits replication of a specific HIV genotype [34], is associated with several autoimmune diseases [43]. Overall, costs are known for about two thirds of the genes with indication of GFG type G×G (Table 1). Interestingly, there is also evidence for costs of resistance in case of SLC11A1, which is one of few genes showing clear MA type G×G (where costs are not necessary to generate NFDS; Fig 1). Here, high and low expression alleles are associated with susceptibility to autoimmune and infectious disease, respectively [44].

Do genes involved in G×G show signatures of balancing selection?
If the G×G identified in humans really lead to coevolution by NFDS, one would expect the genes involved to exhibit signatures of balancing selection that can be detected by analyses of population samples of DNA sequence data [45]. For 12 of the 20 genes involved in G×G, there are such signatures of balancing selection, based on genome-wide scans or candidate gene analyses (Table 1). Most show signatures of long-term balancing selection, in some cases (for example, ABO) in the form of "trans-species polymorphisms," meaning that the polymorphism has been maintained by selection in primates for tens of millions of years [46]. An exception to the trend for long-term balancing selection is HBB, which shows a signature of recent positive or balancing selection (for recent selection, the signatures of positive and balancing selection are indistinguishable) [47].
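One classic statistic sensitive to balancing selection is Tajima's D, which compares two estimates of genetic diversity: long-term balancing selection tends to produce an excess of intermediate-frequency variants and hence positive D. The sketch below implements the standard formula for a small alignment. It is an illustration only, not necessarily the statistic used in the scans summarized in Table 1, and positive D can also result from population structure or demographic change.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences (at least 2 sequences)."""
    n = len(seqs)
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # segregating sites
    if S == 0:
        raise ValueError("no segregating sites")
    pairs = list(combinations(seqs, 2))
    # pi: mean number of pairwise differences between sequences
    pi = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    balanced = ["1111"] * 5 + ["0000"] * 5                   # all variants at frequency 0.5
    rare = ["1000", "0100", "0010", "0001"] + ["0000"] * 6   # all variants are singletons
    print(tajimas_d(balanced) > 0, tajimas_d(rare) < 0)      # True True
```

Trans-species polymorphisms such as those at ABO are typically detected with other approaches (e.g., shared haplotypes across species), which a single-population statistic like this does not capture.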

Conclusions
The present review has shown that several human genes are involved in G×G, as assumed by models of host-pathogen coevolution. Most of the G×G seem to fit the GFG rather than MA scenario, particularly for case-control studies of infection status and other binary disease phenotypes, which means a cost of resistance is required for these G×G to lead to maintenance of polymorphism by NFDS. Such costs are known for at least some of the genes with evidence for G×G. Taken together, this shows there is scope for coevolution by NFDS also in vertebrates. These conclusions come with several caveats, though.
First, for G×G to result in coevolution, the phenotypic trait concerned must be associated with both host and pathogen fitness. While most studied traits (Table 1) clearly can affect host fitness, the relevance for pathogen fitness is doubtful in some cases, for example, meningitis in Streptococcus pneumoniae infection [20] and risk of cervical cancer caused by HPV [28]. Second, in the case of chronic viral infections (HIV, HCV, and EBV), the G×G are thought to be a result of within-host evolution of immune escape, and it is not always clear whether these G×G also affect some aspect of host fitness, such as susceptibility to infection or severity of disease, as would be required for coevolution to occur. However, a recent study of HIV found that at least some of the immune escape mutations led to G×G for viral load [32], indicating that G×G involving immune escape mutations might indeed affect host fitness. Third, inferring whether G×G are of GFG or MA type from currently segregating host and pathogen alleles might be misleading. For example, what is actually an MA type G×G might appear to be a GFG type G×G if rare alleles are not sampled [11,48]. Fourth, the preponderance of GFG type G×G in case-control studies of binary disease phenotypes might be an artefact of how these analyses are done: they analyse each pathogen strain separately and report only host polymorphisms for which the OR differs from 1 for at least one of the pathogen strains. Such analyses will therefore miss MA type G×G where the ORs for 2 pathogen strains are in opposite directions and differ from each other, but neither differs from 1. Even with these caveats in mind, there are some strong cases for coevolutionarily relevant G×G of both GFG and MA type (GFG: e.g., HBB, ABO, FUT2, and HLA genes; MA: e.g., SLC11A1 and HLA genes).
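The third caveat can be made concrete with binary infection matrices (rows are host genotypes, columns are pathogen genotypes, 1 means infection). The sketch below uses a simplified criterion that is an assumption for illustration, not a definition from the cited work: a matrix "looks GFG" when host susceptibility ranges are nested, i.e., for every pair of host genotypes, one is infected by a subset of the pathogen genotypes that infect the other. A hypothetical 3-allele MA system is not nested, but if 1 host allele and 1 pathogen allele go unsampled, the remaining submatrix is nested and looks like GFG.

```python
from itertools import combinations


def susceptibility_ranges(infect):
    """For each host genotype (row), the set of pathogen genotypes (columns) that infect it."""
    return [frozenset(g for g, x in enumerate(row) if x) for row in infect]


def looks_gfg(infect):
    """Simplified criterion: True if all host susceptibility ranges are nested."""
    ranges = susceptibility_ranges(infect)
    return all(a <= b or b <= a for a, b in combinations(ranges, 2))


def subsample(infect, hosts, pathogens):
    """Submatrix containing only the sampled host and pathogen genotypes."""
    return [[infect[h][g] for g in pathogens] for h in hosts]


MA3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # hypothetical 3-allele matching-allele system
print(looks_gfg(MA3))                                # False
print(looks_gfg(subsample(MA3, [0, 1], [0, 2])))     # True: host 2 and pathogen 1 unsampled
```

In the subsample, host genotype 1 appears resistant to every sampled pathogen genotype and host genotype 0 to all but one, exactly the nested pattern expected under GFG, even though the full system is MA.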
The G×G for HBB illustrates that different types of pathogen-mediated balancing selection can act on a given gene simultaneously. HBB is the textbook example of heterozygote advantage, where individuals with 1 copy of the HbS variant have improved resistance to malaria, whereas HbS homozygosity leads to sickle cell disease [49]. The finding that HBB is involved in a G×G with P. falciparum shows that there might also be NFDS on this gene. It is often expected that several different types of pathogen-mediated balancing selection operate simultaneously on a given gene, and HBB is perhaps the clearest evidence yet that this is the case.
The G×G for ABO and the HLA genes illustrate another aspect of pathogen-mediated balancing selection: a given gene might be coevolving with more than 1 pathogen simultaneously, so-called "diffuse coevolution" [3]. Diffuse coevolution is expected to be common, perhaps the norm, but ABO and the HLA genes are, as far as I am aware, the first cases where specific genes have been shown to be involved in G×G with 2 or more different pathogens, demonstrating that there actually is opportunity for diffuse coevolution.
In conclusion, there is some evidence from humans for G×G, a key assumption of models of host-pathogen coevolution by NFDS (but not of other types of pathogen-mediated balancing selection). This indicates that balancing selection on genes at the host-pathogen interface in humans (and other vertebrates) could indeed be a result of coevolution, as is commonly assumed. Nevertheless, more studies testing for G×G are clearly desirable, in particular genome-to-genome studies, as they give an unbiased perspective on which genes are involved. Recent development of statistical approaches should facilitate this [50,51]. Still, it is important to recognise that the presence of G×G only shows that there is opportunity for coevolution by NFDS, not that it has occurred. Demonstrating that polymorphism is a result of coevolution would require additional analyses. One way would be to test for NFDS. Specifically, balancing selection by antagonistic coevolution requires indirect NFDS, i.e., the fitness of a host allele should be negatively correlated with the frequency of the pathogen allele with which it interacts. This could be tested by following 1 or more populations over time. Advances in the analysis of ancient DNA from both mammals and pathogens should make this possible even for humans and other species with long generation times [52]. Even so, identifying the genes involved in G×G, as described in this review, would be a critical first step.
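The indirect-NFDS prediction can be phrased as a simple correlation. A minimal sketch, assuming host and pathogen allele frequencies are available at the same, evenly spaced time points (an idealization for ancient-DNA data): selection on the host allele in each interval is measured as the change in the log-odds of its frequency and correlated with the frequency, at the start of the interval, of the pathogen allele that infects it. The function names and the synthetic series are hypothetical.

```python
import math


def logit(x):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(x / (1.0 - x))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def indirect_nfds_r(p_host, q_pathogen):
    """Correlation between selection on a host allele (t -> t+1) and the frequency,
    at time t, of the pathogen allele that infects it. Negative r fits indirect NFDS."""
    selection = [logit(b) - logit(a) for a, b in zip(p_host, p_host[1:])]
    return pearson(selection, q_pathogen[:-1])


if __name__ == "__main__":
    # Synthetic series, constructed so that the host allele is selected against
    # whenever the matching pathogen allele is above frequency 0.5.
    q = [0.5 + 0.3 * math.sin(0.3 * t) for t in range(50)]
    p = [0.5]
    for t in range(49):
        x = logit(p[-1]) - 0.5 * (q[t] - 0.5)
        p.append(1.0 / (1.0 + math.exp(-x)))
    print(round(indirect_nfds_r(p, q), 3))  # -1.0 (by construction)
```

With real time series, sampling error in frequency estimates, genetic drift, uneven spacing between samples, and uncertainty about generation time would all weaken the correlation, so a formal test would need to model these sources of noise.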